The World-Wide Web: Quagmire or Gold Mine? Is information on the Web sufficiently structured to facilitate effective Web mining?
Abstract
Skeptics believe the Web is too unstructured for Web mining to succeed. Indeed, data mining has traditionally been applied to databases, yet much of the information on the Web lies buried in documents designed for human consumption, such as home pages or product catalogs. Furthermore, much of the information on the Web is presented in natural-language text with no machine-readable semantics; HTML annotations structure the display of Web pages but provide little insight into their content. Some have advocated transforming the Web into a massive layered database to facilitate data mining [12], but the Web is too dynamic and chaotic to be tamed in this manner. Others have attempted to hand-code site-specific "wrappers" that facilitate the extraction of information from individual Web resources (e.g., [8]). Hand coding is convenient but cannot keep up with the explosive growth of the Web. As an alternative, this article argues for the structured Web hypothesis: information on the Web is sufficiently structured to facilitate effective Web mining. Examples of Web structure include linguistic and typographic conventions, HTML annotations (e.g., ), classes of semi-structured documents (e.g., product catalogs), Web indices and directories, and much more. To support the structured Web hypothesis, this article will survey preliminary Web mining successes and suggest directions for future work. Web mining may be organized into the following subtasks:
• Resource discovery. Locating unfamiliar documents and services on the Web.
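The hand-coded, site-specific wrappers mentioned in the abstract can be pictured with a minimal Python sketch. Nothing here comes from the article: the catalog markup, the `CatalogWrapper` class, and the `extract_products` helper are hypothetical, and the sketch assumes a page where every table row holds a product name followed by a price.

```python
from html.parser import HTMLParser

# Hypothetical catalog page; the markup and its layout are assumptions for illustration.
CATALOG_HTML = """
<html><head><title>Acme Catalog</title></head>
<body>
<table>
<tr><td>Widget</td><td>$4.50</td></tr>
<tr><td>Gadget</td><td>$12.00</td></tr>
</table>
</body></html>
"""


class CatalogWrapper(HTMLParser):
    """Hand-coded wrapper: assumes each <tr> holds a name cell, then a price cell."""

    def __init__(self):
        super().__init__()
        self.rows = []         # completed (name, price) records
        self._cells = None     # cells of the row being read, or None outside a row
        self._in_cell = False  # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag == "td" and self._cells is not None:
            self._in_cell = True
            self._cells.append("")

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current cell.
        if self._in_cell:
            self._cells[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._cells is not None:
            # Only rows matching the assumed two-cell layout become records.
            if len(self._cells) == 2:
                name, price = (c.strip() for c in self._cells)
                self.rows.append((name, float(price.lstrip("$"))))
            self._cells = None


def extract_products(html):
    """Run the wrapper over one page and return its (name, price) records."""
    wrapper = CatalogWrapper()
    wrapper.feed(html)
    wrapper.close()
    return wrapper.rows
```

The wrapper relies on the table layout, not on any markup that says "this is a price". If the same products were published as a bulleted list instead, `extract_products` would return an empty list, which is one way to read the abstract's point that hand coding struggles to keep pace with the Web.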
Similar resources
A Technique for Improving Web Mining using Enhanced Genetic Algorithm
The World Wide Web is growing at a very fast pace and makes a lot of information available to the public. Search engines use conventional methods to retrieve information on the Web; however, their search results can still be refined, and their accuracy is not high enough. One method for Web mining is evolutionary algorithms, which search according to the user interests...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Prediction of user's trustworthiness in web-based social networks via text mining
In social networks, users need a proper estimate of trust in others to be able to initiate reliable relationships. Some trust evaluation mechanisms have been offered, which use direct ratings to calculate or propagate trust values. However, in some web-based social networks where users only have binary relationships, there is no direct rating available. Therefore, a new method is required t...
Construction of Web-Based, Service-Oriented Information Networks: A Data Mining Perspective - (Abstract)
Mining directly on the existing networks formed by explicit webpage links on the World-Wide Web may not be so fruitful due to the diversity and semantic heterogeneity of such web-links. However, construction of service-oriented, semi-structured information networks from the Web and mining on such networks may lead to many exciting discoveries of useful information on the Web. This talk will dis...
Adaptive Information Analysis in Higher Education Institutes
Information integration plays an important role in academic environments since it provides a comprehensive view of education data and enables managers to analyze and evaluate the effectiveness of education processes. However, the problem with traditional information integration is the lack of personalization due to weak information resources or unavailability of analysis functionality. In this ...
Publication date: 1996